Skip to content

Conversation

@sanikolaev
Copy link
Collaborator

Related issue #3928

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes an issue with sorting by string attributes when using the OPTION scroll feature (issue #3928). The fix addresses how string attributes are handled in scroll sorters by properly mapping them to their internal STRINGPTR representations used for sorting.

  • Adds proper handling for string attribute sorting with scroll tokens by mapping SPH_ATTR_STRING to SPH_ATTR_STRINGPTR
  • Implements correct memory management for allocated string pointers with tracking and cleanup
  • Fixes the Reset() call to use GetDynamicSize() instead of GetRowSize() to properly allocate only dynamic attributes

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File Description
test/test_308/test.xml Adds comprehensive test cases for string sorting with scroll (both ASC and DESC) using both SphinxQL and JSON API
test/test_308/model.bin Updates expected test results to include brand_name field in outputs and validate string scroll behavior
test/test_307/model.bin Fixes expected results for existing string sorting test - now correctly returns only rows after scroll position (1 instead of 5)
src/sorterscroll.cpp Implements core fix: remaps string attributes to STRINGPTR for scroll comparison, adds pointer tracking, and fixes memory management

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@sanikolaev sanikolaev force-pushed the scroll-issue branch 4 times, most recently from 4a4bd4e to f996bdd Compare November 16, 2025 12:57
@sanikolaev
Copy link
Collaborator Author

Notes after the fix

Problem

When using OPTION scroll for pagination with ORDER BY clauses that include string columns, Manticore Search ignored the scroll token and consistently returned the first page of results instead of subsequent pages.

Root Cause

String attributes in Manticore are internally remapped for sorting purposes:

  • User-facing attribute: SPH_ATTR_STRING (e.g., brand_name)
  • Internal sorting attribute: SPH_ATTR_STRINGPTR with name @int_attr_brand_name

The scroll sorter was not aware of this remapping, so when it tried to:

  1. Look up the attribute by the original name (brand_name)
  2. Check if it's a supported type for scrolling
  3. Set up the reference match for comparison

It would find SPH_ATTR_STRING (not in the supported types list) or fail to properly initialize the reference match, causing the scroll token to be ignored.

Solution Approach

1. Attribute Remapping Logic

Added remapping detection in two key places:

SetupRefMatch(): When setting up the reference match from the scroll token, if the token specifies a string type (SPH_ATTR_STRINGPTR) but the schema has SPH_ATTR_STRING, we look for the remapped @int_attr_* version and use that instead.

CanCreateScrollSorter(): Similarly, when checking if a scroll sorter can be created, we check for the remapped version to correctly identify string attributes as supported.

This ensures the scroll sorter works with the same internal representation that the sorting system uses.

2. Memory Management Fix

Problem: The original code tracked allocated string pointers by storing attribute info pointers, then later tried to retrieve the actual pointer values from m_tRefMatch using GetAttr(). This caused use-after-free crashes when m_tRefMatch was reset or destroyed.

Solution: Changed to store the actual BYTE* pointer values directly when allocating with sphPackPtrAttr(). This way:

  • We don't need to access m_tRefMatch when freeing (avoiding use-after-free)
  • We have direct ownership of the pointers we allocated
  • Freeing is straightforward: iterate through stored pointers and deallocate

3. Memory Allocation Fix

Changed m_tRefMatch.Reset(pSchema->GetRowSize()) to GetDynamicSize() because:

  • GetRowSize() includes both dynamic and static parts
  • GetDynamicSize() returns only the dynamic part size needed for Reset()
  • This ensures proper memory allocation for dynamic attributes

Key Design Decisions

  1. Remapping at scroll sorter level: Rather than changing the internal remapping system, we handle it transparently in the scroll sorter. This is safer and doesn't affect other parts of the codebase.

  2. Direct pointer tracking: Storing pointer values directly rather than attribute info avoids the need to access potentially invalid m_tRefMatch state, making memory management more robust.

  3. Null checks: Added proper null checks for attributes that might not exist in the schema, making the code more defensive.

Testing

The fix was tested with:

  • String sorting in both ASC and DESC order
  • Combined sorting (string + integer attributes)
  • Multiple pages of scroll pagination
  • Existing scroll tests for non-string attributes

Copy link
Contributor

@glookka glookka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once again, it is difficult to evaluate if the changes are correct without checking out, building and running tests manually. All I see in that PR is that the model was changed in test_307. Why? What changed?

if ( i.m_eType==SPH_ATTR_STRINGPTR && pAttr->m_eAttrType==SPH_ATTR_STRING )
{
CSphString sRemappedName;
sRemappedName.SetSprintf ( "@int_attr_%s", i.m_sSortAttr.cstr() );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's GetInternalAttrPrefix() for that.

{
case SPH_ATTR_STRINGPTR:
sphSetRowAttr ( pRowData, pAttr->m_tLocator, (SphAttr_t)sphPackPtrAttr ( { (const BYTE*)i.m_sValue.cstr(), i.m_sValue.Length() } ) );
// After remapping, pAttr should be SPH_ATTR_STRINGPTR, but check to be safe
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what asserts are for

BYTE * pPacked = sphPackPtrAttr ( { (const BYTE*)i.m_sValue.cstr(), i.m_sValue.Length() } );
sphSetRowAttr ( pRowData, pAttr->m_tLocator, (SphAttr_t)pPacked );
// Track the allocated pointer value directly
m_dAllocatedPtrs.Add(pPacked);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the m_dAllocatedPtrs array? We have the sorter schema, we know which attrs were potentially allocated (sphIsDataPtrAttr()), so we can just loop through them.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in commit 2, but it now leads to a crash. As discussed, it makes sense for you to look into this yourself.

Eliminated the m_dAllocatedPtrs vector and adjusted memory deallocation logic in FreeDataPtrAttrs to directly free pointers from m_tRefMatch. This simplifies memory management and ensures proper cleanup of allocated data pointers.
@github-actions
Copy link
Contributor

github-actions bot commented Dec 1, 2025

Linux debug test results

1 140 tests   1 082 ✅  52m 43s ⏱️
    1 suites     54 💤
    1 files        4 ❌

For more details on these failures, see this check.

Results for commit 2d158a0.

@github-actions
Copy link
Contributor

github-actions bot commented Dec 1, 2025

Linux release test results

1 140 tests   1 082 ✅  14m 57s ⏱️
    1 suites     54 💤
    1 files        4 ❌

For more details on these failures, see this check.

Results for commit 2d158a0.

@github-actions
Copy link
Contributor

github-actions bot commented Dec 1, 2025

clt

❌ CLT tests in test/clt-tests/sharding/cluster/ test/clt-tests/sharding/functional/ test/clt-tests/sharding/replication/ test/clt-tests/sharding/syntax/
✅ OK: 18
❌ Failed: 1
⏳ Duration: 289s
👉 Check Action Results for commit 6cdd434

Failed tests:

🔧 Edit failed tests in UI:

test/clt-tests/sharding/cluster/test-drop-sharded-clustering-table.rec
––– input –––
export INSTANCE=1
––– output –––
OK
––– input –––
mkdir -p /var/{run,lib,log}/manticore-${INSTANCE}
––– output –––
OK
––– input –––
stdbuf -oL searchd -c test/clt-tests/base/searchd-with-flexible-ports.conf > /dev/null
––– output –––
OK
––– input –––
if timeout 10 grep -qm1 '\[BUDDY\] started' <(tail -n 1000 -f /var/log/manticore-${INSTANCE}/searchd.log); then echo 'Buddy started!'; else echo 'Timeout or failed!'; cat /var/log/manticore-${INSTANCE}/searchd.log; fi
––– output –––
OK
––– input –––
export INSTANCE=2
––– output –––
OK
––– input –––
mkdir -p /var/{run,lib,log}/manticore-${INSTANCE}
––– output –––
OK
––– input –––
stdbuf -oL searchd -c test/clt-tests/base/searchd-with-flexible-ports.conf > /dev/null
––– output –––
OK
––– input –––
if timeout 10 grep -qm1 '\[BUDDY\] started' <(tail -n 1000 -f /var/log/manticore-${INSTANCE}/searchd.log); then echo 'Buddy started!'; else echo 'Timeout or failed!'; cat /var/log/manticore-${INSTANCE}/searchd.log; fi
––– output –––
OK
––– input –––
export CLUSTER_NAME=c
––– output –––
OK
––– input –––
mysql -h0 -P1306 -e "create cluster ${CLUSTER_NAME}"
––– output –––
OK
––– input –––
mysql -h0 -P1306 -e "show status like 'cluster_${CLUSTER_NAME}_status'\G"
––– output –––
OK
––– input –––
for n in `seq 2 $INSTANCE`; do mysql -h0 -P${n}306 -e "join cluster ${CLUSTER_NAME} at '127.0.0.1:1312'"; done;
––– output –––
OK
––– input –––
mysql -h0 -P${INSTANCE}306 -e "show status like 'cluster_${CLUSTER_NAME}_status'\G"
––– output –––
OK
––– input –––
mysql -h0 -P1306 -e "create table ${CLUSTER_NAME}:tbl1(id bigint) shards='3' rf='2';"; echo $?;
––– output –––
OK
––– input –––
echo "=== Node 1306 ==="; mysql -h0 -P1306 -e "SHOW TABLES\G" | sed 's/^[[:space:]]*//' || echo "Node 1306 failed!"; echo "=== Node 2306 ==="; mysql -h0 -P2306 -e "SHOW TABLES\G" | sed 's/^[[:space:]]*//' || echo "Node 2306 failed!"
––– output –––
OK
––– input –––
mysql -h0 -P1306 -e "DROP TABLE ${CLUSTER_NAME}:tbl1;"; echo $?;
––– output –––
OK
––– input –––
echo "=== Node 1306 ==="; mysql -h0 -P1306 -e "SHOW TABLES\G" | sed 's/^[[:space:]]*//' || echo "Node 1306 failed!"; echo "=== Node 2306 ==="; mysql -h0 -P2306 -e "SHOW TABLES\G" | sed 's/^[[:space:]]*//' || echo "Node 2306 failed!"
––– output –––
OK
––– input –––
mysql -h0 -P1306 -e "create table ${CLUSTER_NAME}:Tbl2(id bigint) shards='3' rf='1';"; echo $?
––– output –––
OK
––– input –––
echo "=== Node 1306 ==="; mysql -h0 -P1306 -e "SHOW TABLES\G" | sed 's/^[[:space:]]*//' || echo "Node 1306 failed!"; echo "=== Node 2306 ==="; mysql -h0 -P2306 -e "SHOW TABLES\G" | sed 's/^[[:space:]]*//' || echo "Node 2306 failed!"
––– output –––
OK
––– input –––
mysql -h0 -P1306 -e "DROP TABLE ${CLUSTER_NAME}:Tbl2;"; echo $?
––– output –––
OK
––– input –––
echo "=== Node 1306 ==="; mysql -h0 -P1306 -e "SHOW TABLES\G" | sed 's/^[[:space:]]*//' || echo "Node 1306 failed!"; echo "=== Node 2306 ==="; mysql -h0 -P2306 -e "SHOW TABLES\G" | sed 's/^[[:space:]]*//' || echo "Node 2306 failed!"
––– output –––
OK
––– input –––
mysql -h0 -P1306 -e "create table ${CLUSTER_NAME}:tbl_missing_type(id) shards='3' rf='1';"
––– output –––
OK
––– input –––
echo "=== Node 1306 ==="; mysql -h0 -P1306 -e "SHOW TABLES\G" | sed 's/^[[:space:]]*//' || echo "Node 1306 failed!"; echo "=== Node 2306 ==="; mysql -h0 -P2306 -e "SHOW TABLES\G" | sed 's/^[[:space:]]*//' || echo "Node 2306 failed!"
––– output –––
OK
––– input –––
mysql -h0 -P1306 -e "DROP TABLE ${CLUSTER_NAME}:tbl_missing_type;"; echo $?
––– output –––
OK
––– input –––
echo "=== Node 1306 ==="; mysql -h0 -P1306 -e "SHOW TABLES\G" | sed 's/^[[:space:]]*//' || echo "Node 1306 failed!"; echo "=== Node 2306 ==="; mysql -h0 -P2306 -e "SHOW TABLES\G" | sed 's/^[[:space:]]*//' || echo "Node 2306 failed!"
––– output –––
OK
––– input –––
LONG_TABLE_NAME=$(printf "tbl%065d" 1)
––– output –––
OK
––– input –––
mysql -h0 -P1306 -e "create table ${CLUSTER_NAME}:${LONG_TABLE_NAME}(id bigint) shards='3' rf='1';"
––– output –––
OK
––– input –––
echo "=== Node 1306 ==="; mysql -h0 -P1306 -e "SHOW TABLES\G" | sed 's/^[[:space:]]*//' || echo "Node 1306 failed!"; echo "=== Node 2306 ==="; mysql -h0 -P2306 -e "SHOW TABLES\G" | sed 's/^[[:space:]]*//' || echo "Node 2306 failed!"
––– output –––
OK
––– input –––
mysql -h0 -P1306 -e "DROP TABLE ${CLUSTER_NAME}:${LONG_TABLE_NAME};"
––– output –––
+ ERROR 1064 (42000) at line 1: Waiting timeout exceeded.
––– input –––
echo "=== Node 1306 ==="; mysql -h0 -P1306 -e "SHOW TABLES\G" | sed 's/^[[:space:]]*//' || echo "Node 1306 failed!"; echo "=== Node 2306 ==="; mysql -h0 -P2306 -e "SHOW TABLES\G" | sed 's/^[[:space:]]*//' || echo "Node 2306 failed!"
––– output –––
OK
––– input –––
mysql -h0 -P1306 -e "DROP TABLE ${CLUSTER_NAME}:nonexistent_table;"
––– output –––
OK
––– input –––
mysql -h0 -P1306 -e "DROP TABLE nonexistent_cluster:tbl1;"
––– output –––
OK
––– input –––
mysql -h0 -P1306 -e "DROP TABLE ${CLUSTER_NAME}:tbl1;"
––– output –––
OK
––– input –––
mysql -h0 -P1306 -e "INSERT INTO ${CLUSTER_NAME}:tbl1 VALUES (1);" & sleep 1; mysql -h0 -P1306 -e "DROP TABLE ${CLUSTER_NAME}:tbl1;"
––– output –––
OK

@github-actions
Copy link
Contributor

github-actions bot commented Dec 1, 2025

Windows test results

    3 files      3 suites   29m 2s ⏱️
1 116 tests 1 055 ✅ 54 💤 7 ❌
1 118 runs  1 057 ✅ 54 💤 7 ❌

For more details on these failures, see this check.

Results for commit 2d158a0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants